Extreme-Scale De Novo Genome Assembly
Abstract
De novo whole-genome assembly reconstructs a genomic sequence from short, overlapping, and potentially erroneous DNA segments and is one of the most important computations in modern genomics. This work presents HipMer, a high-quality end-to-end de novo assembler designed for extreme-scale analysis via efficient parallelization of the Meraculous code. Genome assembly software has many components, each of which stresses a different part of a computer system. This chapter explains in detail the computational challenges involved in each step of the HipMer pipeline, the key distributed data structures, and the communication costs. We present performance results for assembling the human genome and the large hexaploid wheat genome on large supercomputers using up to tens of thousands of cores.
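To make "short, overlapping, and potentially erroneous DNA segments" concrete, here is a minimal single-process sketch of the general k-mer idea behind de Bruijn-style assemblers. It is an illustration under stated assumptions, not HipMer's or Meraculous's implementation. The function names, the toy reads, `k = 4`, and the `min_count` threshold are all hypothetical. Reverse complements and any distribution of the data across nodes are ignored.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def solid_kmers(counts, min_count=2):
    """Keep k-mers seen at least min_count times; rarer ones are treated as likely errors."""
    return {kmer for kmer, n in counts.items() if n >= min_count}

def extend_right(seed, kmers):
    """Extend a seed k-mer to the right while exactly one successor k-mer exists."""
    contig = seed
    current = seed
    visited = {seed}
    while True:
        suffix = current[1:]
        candidates = [suffix + base for base in "ACGT" if suffix + base in kmers]
        # Stop at a dead end, a fork, or a cycle.
        if len(candidates) != 1 or candidates[0] in visited:
            break
        current = candidates[0]
        visited.add(current)
        contig += current[-1]
    return contig

# Toy input (assumption): each correct read appears twice, plus one read with an error.
reads = [
    "ACGTTGCA", "CGTTGCAT", "GTTGCATC",
    "ACGTTGCA", "CGTTGCAT", "GTTGCATC",
    "ACGTAGCA",  # erroneous read: T -> A at position 5
]

counts = count_kmers(reads, 4)
filtered_contig = extend_right("ACGT", solid_kmers(counts))
unfiltered_contig = extend_right("ACGT", set(counts))
print(filtered_contig)    # ACGTTGCATC
print(unfiltered_contig)  # ACGT
```

In this toy case the erroneous read creates a fork right after `ACGT`, so the unfiltered walk stops immediately, while dropping k-mers seen only once lets the walk recover the full sequence.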
Similar resources
Clustering of Short Read Sequences for de novo Transcriptome Assembly
Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...
Whole Genome Amplification and De novo Assembly of Single Bacterial Cells
BACKGROUND Single-cell genome sequencing has the potential to allow the in-depth exploration of the vast genetic diversity found in uncultured microbes. We used the marine cyanobacterium Prochlorococcus as a model system for addressing important challenges facing high-throughput whole genome amplification (WGA) and complete genome sequencing of individual cells. METHODOLOGY/PRINCIPAL FINDINGS...
De Novo Assembly with the Genome Analyzer
The utility of the Illumina Genome Analyzer for a broad range of applications is evidenced by an amazing rate of publication by the research community [1]. In addition to human-scale resequencing [2–4] and other applications requiring a reference genome, researchers are performing de novo assembly. Several important organisms have been sequenced for the first time as a direct result of the consisten...
Succinct data structures for assembling large genomes
MOTIVATION Second-generation sequencing technology makes it feasible for many researchers to obtain enough sequence reads to attempt the de novo assembly of higher eukaryotes (including mammals). De novo assembly not only provides a tool for understanding wide-scale biological variation, but within human biomedicine, it offers a direct way of observing both large-scale structural variation and f...
Genome Sequence of the Extreme Obligate Alkaliphile Bacillus marmarensis Strain DSM 21297
Bacillus marmarensis strain DSM 21297 is an extreme obligate alkaliphile able to grow in medium up to pH 12.5. A whole-shotgun strategy and de novo assembly led to the generation of a 4-Mbp genome of this strain. The genome features alkaliphilic adaptations and pathways for n-butanol and poly(3-hydroxybutyrate) synthesis.
Journal title:
- CoRR
Volume abs/1705.11147
Publication date: 2017